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1.   INTRODUCTION 

Recently  algorithms  have  "been  developed  for  the  simultaneous  fitting 
of  several  exponential  decay  curves  with  common  exponential  factor  [l,  2]. 

In  many  physical  situations,  however,  the  curves  need  to  be  fitted  by 

,  n      ,     . 
multi-term  (  £  a.exp(b.tj)  exponential  sums  instead  of  by  a  single  expo- 
i=l  x     x 

nential  (n=l).   Before  simultaneous  curve  fitting  can  be  done,  good  al- 
gorithms must  exist  for  the  approximation  of  a  single  curve  by  the  func- 
tional form  under  consideration.   Uniform  curve  fitting  by  sums  of  expo- 
nentials has  been  studied  to  some  extent  [3,  *+],  but  there  remains  a  need 
for  further  development  of  efficient,  reliable  algorithms. 

A  method  which  has  been  quite  useful  for  the  construction  of  best 
rational  approximations  is  the  differential  correction  method.   This 
method  has  been  studied  in  some  detail  [5]  and  has  recently  been  success- 
fully combined  with  linear  programming  to  form  an  efficient  algorithm  [6]. 
The  combination  would  also  appear  to  be  promising  for  exponential  ap- 
proximation.  Indeed  Braess  [k]   has  discussed  a  differential  correction 
approach  to  approximation  by  exponential  sums;  he  does  not,  however,  seem 
to  combine  the  method  with  linear  programming. 

In  this  paper  we  report  on  a  combined  differential-correction  plus 
linear-programming  algorithm  for  approximation  by  sums  of  exponentials. 
The  next  sections  contain  a  brief  discussion  of  the  algorithm,  while  in 
Section  k   we  report  on  some  computer  tests. 


2.   SOME  THEORETICAL  CONSIDERATIONS 

The  problem  to  be  solved  is  the  following.   Let  g(t)  be  a  given 

function  defined  on  a  finite  point  set  T  =  {t.}.  _  CT  [0,°°).   Let  F   be 

1   i=0  '  n 

the   family  of  n-term  exponential   sums  with  nonnegative  coefficients: 

n 

F        =   {      T  a.exp(b.t)    :    a.    >  0,   b.e  R}. 
n  .  ii_    i  l  l—i 

i=l 

We  wish  to  determine  a  best  approximation  f  from  F   to  g  on  T,  where 

"best"  is  defined  by  the  uniform  (or  Chebyshev)  norm.   That  is,  with 

|f||  =  maXrJf  (t )  | ,  a  best  approximation  f  satisfies 

| |f-g| |  =   inf  | |h-g| |. 
heFn+ 

On  a  compact  interval  best  approximations  to  continuous  functions 
from  F   are  guaranteed  to  exist  and  to  be  unique  by  a  theorem  due  to 
Braess  [3].   On  a  finite  point  set,  however,  the  situation  is  more  compli- 
cated.  Consider  the  following  example.   Suppose  we  wish  to  find  a  best 

+ 
approximation  from  F   to  f  given  by: 


t 

f(t) 

0 

1 

1 

-0.2 

2 

0.1 

A  best  approximation  should  be  characterized  by  a  3-point  alternant.   That 
is,  we  want  to  find  a,  b,  d  satisfying  the  system 

a  exp  (bt.  )  +  (-l)1d  =  f(t.) 

1  X      i  =  1,  2,  3. 

Adding  together  the   first  two  and  the  last  two   equations,  we  get 

a{exp(bt1)   +  exp(bt2)}   =  fO^)   +  f(tg) 

a{exp(bt2)   +  exp(bt3)}   =  f(tg)    +   f(%3), 
or 


a  exp(bt    ){1  +  exp(b)}   =   f(t    )   +   f(t2) 

a  exp(bt2){l  +  exp(b)}   =   f(tg)   +   fftg). 
Division  of  the  second  of  these  by  the   first  yields 

exp(b)   =   (f(t2)   +   f(t3)}/{f(t1)   +   f(t2)}. 
The  left   side  of  this   equality   is  positive,   but  the  right   side   is   nega- 
tive  for  the   given   data;   hence  a  best   approximation   can  not  be  constructed. 

Indeed,   what  happens    is  that   as  b-*-- °°,       |  f-exp(bt )  |  |-MD.2,   but  this   limit 

+ 
is   never  attained  by  any  member  of  F      . 

This   difficulty  may  be   gotten  around   [3]  by   redefining  F       to   in- 
clude a  boundedness   condition    |b.|<M.      In  computational  work,   there   is   an 
automatic   bound  imposed  by  the  computer;    i.e.    one  would  expect   that   an 
attempt  to  compute  a  best   approximation   in  the  example  above  would  lead 
to  an  overflow.      It   is  preferable,    of  course,   to  build  a   smaller  bound 
into  the  computer  program. 

Keeping  in  mind  this  theoretical  difficulty,  we  shall  henceforth 
assume  that  the  particular  best  approximations  that  we  wish  to  compute 
exist   and  are  unique. 


3.      THE  ALGORITHM 

n 

Let   f  (an  ,    .  . .  ,    a   ,   b_  ,    ...,   b    ;    t)   =      /    a.exp(b.t)   and  for  simplic- 
1  n        1  n  . L^    11  ^ 

1=1 

ity  write  this  as  f(A;  t),  where  A  denotes  the  parameter  vector: 

A  =  (a  ,  ...,  a  ,  b  ,  ...,  b  ).   Let  Vf  represent  the  gradient  of  f  with 

respect  to  A,  i.e. 

1    '  ^9a  '  ••'»  9a  '  9b  »  '•'  9b  ;* 

1         n    1         n 

Then,  if  the  parameters  are  changed  by  a  small  amount  6A,  to  a  good  ap- 
proximation one  has 

(1)     f(A  +  6A;  t)  :  f(A;  t)  +  (Vf(A;  t),  5A), 
where  the  argument  A  in  the  gradient  indicates  that  the  partial  deriva- 
tives are  evaluated  at  A.   Let 

E(A;  t)  E  max  |g(t)  -  f(A;  t)|, 
teT 

the  error  of  approximating  g  by  f(A).   Then  the  error  of  approximating 
g  by  f(A+6A)  can  be  estimated  as  follows: 

E(A+6A;  t)  E  max  |g(t)  -  f(A  +  6A;  t)| 
teT 

=  max  |g(t)  -  f(A;  t)  +  f(A;  t)  -  f(A  +  6A;  t)| 
teT 

:  max  |E(A;  t)  -  (Vf(A;  t),  5A)|. 
teT 

The  algorithm  then  proceeds  as  follows: 

1.  Begin  with  an  initial  approximation  f(An;  t).   Then  for 
k  =  0,  1,  ...  : 

2.  Minimize  max  |E(A^;  t)  -  (Vf(A  ;  t),  6A) | 

teT     K  K 

=   e(A;  6A) 

over  all  vectors  6A  e  R   .   (This  step  will  be  discussed  in  detail  later.) 

-I 

3.  Using  the  minimizing  6A  from  Step  2,  form  A    =  A  +  2   6A, 


where  I   is  the  smallest  nonnegative  integer  such  that 

l|E(Ak+1)||  1  |  lE(A^)  I  |  -  i-2-*{||E(Ak)||  -  e(A^;    6A)}. 

This  condition,  or  something  like  it,  is  needed  in  order  to  guard  against 
making  such  large  changes  in  the  parameter  vector  that  the  approximation 
(l)  is  invalid.   (Using  this  condition,  Braess  [h]    is  able  to  prove  a 
limited  sort  of  convergence  theorem  for  the  basic  scheme  outlined  here. ) 

k.      If  I  |e(A  )  I  I  -  I  |e(A^   )  I  I  <_  r  |  |e(A   )  I  I  ,  where  r  is  some  small 
number  (10  '  in  the  present  implementation),  terminate  the  program  and 
return  A    as  the  "best"  approximation  parameter  vector.   Otherwise,  con- 
tinue  to  iterate.   (Go  back  to  Step  2. ) 

As  noted  by  Braess  [h]9    Step  2  is  a  linear  approximation  problem 
(i.e.  approximation  from  the  linear  subspace  spanned  by  the  components  of 
the  gradient  of  f),  and  therefore  it  is  generally  solvable  by  standard 
techniques.   Presumably  Braess  uses  some  technique  along  the  lines  of  the 
Remez  algorithm,  which  works  by  making  successive  approximations  to  the 
extremal  points,  since  he  mentions  difficulties  with  extremal  points.   To 
avoid  such  difficulties,  our  algorithm  uses  linear  programming  for  Step  2. 
The  programming  problem  is  set  up  as  follows.   First,  we  get  rid  of  the 
absolute  value  by  doubling  the  number  of  expressions;  i.e.  we  minimize 

max{E(Ak;  t)  -  (Vf(A  ;  t),  6A),  -  E(Afc;  t)  +  (Vf(A  ;  t),  6A)>. 

L>  c  -L 

This  is  readily  converted  to  the  standard  linear  programming  formulation: 
Maximize  -M  over  all  6A,  M  such  that 

(Vf(Ak;  t),  6A)  -  M  <  E(A^-    t)      (teT) 
-(Vf(Ak;  t),  6A)  -  M  <  -E(Afc;  t)     (teT) 
and       -M  <_  0 . 
If  L,  the  number  of  points  in  T,  is  large,  this  appears  to  be  a  very  large 


LP  problem.   We  notice,  however,  that  this  problem  is  identical  to  the 
"dual"  problem  of  the  standard  form  pair  [T,  p.  ^T]»  so  we  convert  to  the 

primal  (dual  of  the  dual)  problem: 

L 
Minimize   J   (E(A^;  t.)X  -  E(A^;  t±)\+±) 


i=l 


with  constraints 

L 


J=i    ([Vf(Ak;  t.)]^.-  [Vf(Ak;  t.)]^.) 


=  0, 


j  =  1,  2,  ...  2n, 


2L 

I 

i=l 


-I     Xi  "  X2L+1  "  -1" 


and  X.  >  0  for  all  i. 

l  — 

(The  notation  [  ].  denotes  the  j   component  of  the  vector.) 

J 

Notice  that  by  dualizing  we  transform  from  a  problem  involving  2n+l 
variables  (M  plus  the  components  of  6A)  and  2L+1  constraints  to  one  with 
2L+1  variables  (X  ,  ...,  X    )  and  2n+l  constraints.   It  is  advantageous 

X  CLJ-I+J- 

to  have  the  number  of  constraints  be  the  smaller  number.   No  extra  work 
was  required  to  obtain  the  desired  values  of  M  and  6A,  since  in  the  linear 
programming  algorithm  used  the  dual  solution  automatically  is  generated 
in  the  course  of  the  computation. 

The  linear  programming  routine  used  was  an  implementation  of  a 
version  of  the  simplex  method  based  on  Cholesky  factorization  (where  the 
basis  is  expressed  as  a  product  of  a  lower  triangular  and  an  orthogonal 
matrix).   This  factorization  is  more  stable  than  the  commonly  used  "pro- 
duct form  of  the  inverse".   Details  of  the  method  may  be  found  in  [8,  9]. 

+ 
Notice  that  although  approximation  is  to  be  from  the  set  F   ,  the 

algorithm  contains  no  provision  for  restricting  the  coefficients  a.  to 

positive  values.   Braess  suggests  doing  this  by  increasing  the  integer  £ 


(see  Step  3)  if  necessary.   We  have  opted  to  keep  the  computer  program 
more  general,  allowing  the  coefficients  to  become  negative.   A  theoretical 
difficulty  with  respect  to  existence,  even  for  approximation  on  compact 
intervals,  occurs  in  this  case.   But  it  is  interesting  to  see  how  the 
algorithm  handles  such  troublesome  situations,  and  the  coefficients  should 

automatically  remain  positive  for  curves  which  are  reasonably  fitted  by 

+ 
members  of  F 
n 


k.      COMPUTER  TESTS 

To  see  that  the  algorithm  was  working  properly,  and  also  to  get  an 
idea  of  its  efficiency,  we  first  did  a  number  of  one-term  (n=l)  exponen- 
tial fits,  using  as  data  the  same  simple  polynomials  which  were  previously 
fitted  using  a  Remez-type  algorithm  [l].   Runs  were  made  in  Fortran  G  on 
the  University  of  Illinois'  366/75  computer.   Convergence  was  fairly  rapid 
in  all  cases.   Generally  five  or  six  iterations  were  required  before  the 
stop  test  was  met.   It  should  be  noted,  however,  that  the  parameters  were 
initially  taken  to  be  zero  (A  =  (0,  0)).   On  the  first  iteration,  a  best 
constant  approximation  was  obtained;  the  second  iteration  produced  a  non- 
zero exponential  factor.   Thus  had  we  started  with  a  good  initial  guess 
we  probably  could  have  cut  down  the  number  of  iterations  by  two.   In  com- 
paring the  timing  (see  Table  l)  this  fact  should  be  kept  in  mind.   The 
Remez  algorithm  did  start  with  a  good  initial  approximation;  hence  some 
8  to  10  cs.  should  be  subtracted  from  the  LP  times  for  a  valid  comparison. 
(The  time  for  each  iteration  step  averaged  just  under  6  cs.;  however  the 
first  two  steps  were  faster,  being  never  more  than  5  cs.)   The  stopping 
criteria  were  also  not  precisely  the  same;  the  effect  of  this  is  felt  to 
be  in  the  direction  of  again  adding  time  to  the  LP  runs.   Looking  at 
Table  I  we  see  that  even  when  these  effects  are  allowed  for  the  LP  method 

generally  takes  four  or  five  times  as  long  as  the  Remez  method.   An  in- 

2 
teresting  anomalous  case  was  that  of  the  function  x  -x+3,  where  the  times 

were  roughly  comparable  —  unusually  long  among  the  Remez  runs  and  un- 
usually short  among  the  LP  runs.   (This  example  demonstrates  rather  well 
the  futility  of  trying  to  make  tight  a   priori  timing  estimates  for  such 
iterative  processes.)  We  wish  to  emphasize  that  the  time  differential  was 


TABLE  I 

Efficiency  Comparison.   Curve  fitting  by  aexp(bt).   Func- 
tions defined  at  20  equally-spaced  points  on  [0,  l].   Times 
are  given  in  centiseconds . 


Function 

Remez  Times 

LP  Times    [No. 

of  iterations ] 

■  -3x  +  5 

9 

kk 

[5] 

-3x  +  k 

5 

33 

[5] 

-3x2  +  h 

7 

Uo 

[6] 

-3x3  +  5 

10 

U6 

[6] 

-2x  +  T 

8 

32 

[5] 

-3x  +  6 

5 

1+1 

[5] 

7 

2 

111 

[2] 

x2-x  +  2 

5 

15 

[2] 

x2-5x  +  2 

5 

111 

[5] 

x2-x  +   3 

10 

19 

[2] 

2x2-3x  +  2 

6 

36 

[5] 

-x  +  2 

7 

32 

[5] 

Ux2-7x  +  U 

8 

35 

[6] 

x2-2x  +  2 

7 

31 

[5] 

not  unexpected.   Remez-type  algorithms  are  known  to  be  highly  efficient, 
converging  quadrat ically.   The  fact  that  the  time  differential  was  no 
greater  than  it  was  is  quite  encouraging,  since  it  indicates  that  the  LP 
approach  should  not  be  hopelessly  time-consuming  when  applied  to  multi- 
term  exponential  approximation. 

One  further  comparison  was  made  with  the  results  of  [l].   The  pre- 
vious report  [l]  dealt  basically  with  a  problem  of  simultaneous  approxima- 
tion —  specifically  the  problem  of  minimizing; 

max  max  |g.(t)  -  a.exp(bt)| 
i  teT   X  X 

where   g   ,    g   ,    ...,    g     are   a  given   set   of  curves  to  be   fitted  by   exponen- 
tials with  a  common   exponential   factor  b.      Obviously,    this   problem  can  be 
set   up  just   as  the   single-curve   approximation  problem,    differing  only   in 
that  the  LP  step  on  each   iteration  is  now  m  times   as  large  as   for   a  single 

curve.      Instead  of  two  unknowns  we  have  m+1    (the  parameters   a.,  ,    ...,   a    , 

1        m 

b),  while  the  number  of  constraints  has  increased  essentially  to  m.  *  L  (L 
points  of  evaluation  for  each  of  the  m  curves). 

In  [l]  the  method  proposed  for  handling  this  problem  involved  con- 
struction of  a  best  approximation  to  each  curve  individually,  followed  by 
construction  of  a  best  simultaneous  approximation  to  each  pair  of  curves. 
These  constructions  were  done  by  Remez-type  methods.   It  was  suggested  in 
[l]  that,  even  though  many  separate  iterations  were  required,  the  require- 
ment that  no  more  than  two  curves  be  handles  at  once  should  effect  some 
savings  over  any  scheme  which  handles  all  data  simultaneously.   To  look 
further  into  this  question,  we  generalized  our  program  to  handle  simul- 
taneous curve  fitting  as  described  in  the  prececeding  paragraph.   The  set 
of  functions  fitted  consisted  of  the  first  seven  functions  from  Table  I. 


10 


The  method  described  in  [l]  was  about  three  times  as  fast  as  that  of  iter- 
atively  solving  large  linear  programming  problems  —  even  though  only  five 
such  iterations  were  required!   (Specifically  the  times  were  338  vs_.  983 
cs.  )   In  addition,  the  Remez  approach  supplies  a  good  bit  of  extra  useful 
information,  such  as  all  of  the  individual  best  approximations  and  which 
individual  curve  or  pair  of  curves  is  critical  in  the  sense  of  character- 
izing the  best  value  of  b.   Also,  after  the  best  b  is  identified,  the  best 
coefficients  (for  that  b)  are  computed  for  each  curve.   (The  simultaneous 
linear  programming  method  does  not  change  coefficients  which  have  no  ef- 
fect on  the  maximum  error. ) 

The  factor  of  three  difference  in  times  depends  not  only  on  the 
particular  functions  fitted  but  also  on  their  number  (m).   Most  of  the 
work  in  the  Remez  method  for  m  individual  curves  consists  of  identifica- 
tion of  extremal  points,  which  requires  effort  proportional  to  m.   But 
pairwise  comparisons  and  computation  of  best  pairwise  approximations  must 
also  be  carried  out,  making  the  overall  work  more  nearly  proportional  to 
the  number  of  pairs  (m(m-l)).   Even  though  the  number  of  pairs  for  which 
a  simultaneous  approximation  must  be  computed  is  generally  less  than 
m(m-l )  (e.g.  l6  instead  of  1+2  for  the  T-curve  example),  the  work  is  essen- 
tially  0(m  ).   On  the  other  hand,  the  work  to  solve  the  comparable  linear 
programming  problem  with  (m+l)  unknowns  is  0(m+l)  .   Hence  as  m  increases 
it  would  seem  increasingly  advantageous  to  use  the  Remez  approach.   The 
number  L  of  points  in  the  set  T  (the  grid  points)  is  also,  of  course,  an 
important  factor;  the  necessity  of  continually  sweeping  the  grid  for  ex- 
tremal points  should  cause  the  Remez  method  to  be  more  severely  affected 
by  increasing  L  than  is  the  LP  method. 
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Only  a  limited  number  of  tests  of  the  program  for  multi-term  (n>l) 
fitting  have  "been  run  to  date.   We  have  used  as  test  data  the  function 
g(t)  =  l/(l+t)  on  [0,  l].   This  example  has  previously  been  used  "by  Braess 
[h]    and  is  well-behaved  in  the  sense  that  Braess  seemed  to  have  no  diffi- 
culty in  computing  best  approximations  for  n<5  and  all  of  these  turn  out 
to  have  all  positive  coefficients  and  all  negative  exponential  factors, 
without  any  sign  restrictions  having  been  imposed  a_  priori . 

The  program  worked  well  for  n=2.   Beginning  with  initial  values 
a  =a  =b  =b  =0,  it  required  twelve  iterations  to  obtain  a  "best"  approxima- 
tion comparable  to  Braess'  (||e|   <  2.07  x  10   ).   As  in  single-term  fit- 
ting, the  first  few  iterations  were  needed  to  develop  a  nonzero  initial 
approximation.   When  the  more  realistic  values  a  =a  =3/8;  b  =b  =0  were 
used  as  initial  data,  only  six  iterations  were  required  to  reach  the  same 
degree  of  approximation.   (This  choice  of  initial  values  was  inspired  by 
the  fact  that  then  a  +a  =3A,  "the  best  constant  approximation  to  l/(l+t) 
on  [0,  1].) 

Convergence  is,  as  expected,  sensitive  to  the  choice  of  initial 
values.   Starting  with  the  poor  values  a=a  =1;  b  =b  =0,  after  25  itera- 
tions the  program  had  reached  (rounded)  values  of  a  =1.06,  b  =-.05, 
a  =-.09,  b  =-.63,  ||e||=.32.   (CF.  correct  values:   a  =.7lkt    a  =.286, 
b  =-.U07,  b  =-2.  Ui+3.  )   Changes  appeared  to  be  more  or  less  in  the  correct 
direction,  but  very  slow  ( £=12  or  13  in  Step  3).   The  problem  seemed  to  be 
that  the  program  quickly  jumped  to  a  small  negative  a  on  the  first  iter- 
ation and  was  unable  to  recover  from  this  false  step. 

Braess  [h]   chooses  his  initial  values  as  follows.   Let  a.,  ,  ...,  a  n, 

J  1'  n-1 ' 

b    ,    ...,   b  tie  the  parameters   for  the  best    (n-l)-term  approximation. 
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Then   choose  as   initial   values    for  the  n-term  approximation  a.=a.,   b.=b. 

(i   =  1,    2,    ...,    n-l):    a  =0,   h      some  value  other  than  the  b.'s.      Several 

n     n  l 

different  values  for  h  may  need  to  be  tried  before  convergence  is  ob- 
tained.  Braess  suggests  trying  in  turn  the  values 


then 


(b.  +  t>i+1)/2,   i  =  1,  2,  ...,  n-2, 


b.,  -b  and  b  ,+b,  where  b  = 
1        n-l 


max{T}  -  min{T} 

In  this  example,  b=3,  and  the  best  one-term  approximation  is  0.977exp(-0. 71 5t ) 
Thus  Braess  would  have  made  (at  most)  two  runs  starting  from  the  following 
sets  of  initial  values. 


Run 

ai 

a^ 
2 

\ 

b2 

I 

0.977 

0 

-0.715 

-3.715 

II 

0.977 

0 

-0.715 

2.285 

Using  these  initial  values,  we  obtained  convergence  (i.e.   |E|   < 
2.07  x  10   )  in  7  iterations  for  Run  I  —  more  than  were  required  when  we 
started  with  reasonable  constant  approximations  (zero  exponential  factors).' 
Predictably,  Run  II  did  not  converge  —  at  least  not  in  a  reasonable 
length  of  time.   After  28  iterations  the  unrealistic  positive  value  of  b 
had  slowly  but  steadily  decreased  to  1.1.   It  is  interesting  that,  from 

the  output  to  Step  2  on  each  iteration,  a  large  change  to  negative  b  was 

-7 
indicated,  but  this  change  was  cut  down  by  2   by  the  condition  invoked 

in  Step  3.   It  may  be  that  this  condition,  though  appearing  theoretically 

desirable,  is  antiproductive  in  some  cases.   Braess,  by  the  way,  does  not 

provide  computational  details,  but  only  indicates  that  he  got  convergence 

to  the  best  approximation  for  some  run. 
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In  order  to   investigate  whether  the  test    in  Step   3  may   indeed  some- 
times  cause  trouble,   we  computed  the  values   of  the  parameters   in  Run  II 
which  would  have  resulted  after  the  third   iteration,    if  the   full   correc- 
tions  had  been  made.      These  values  were  a     =  0.891,    a     =  0.1073, 
b     =  -1.1708,    and  b     =  -2.ll+2l+.      We  then  restarted  the  program  with   these 
values   as    initial   values.      As  we  had  hoped,    convergence   did  occur,    al- 
though  22   iterations  were  required.      How  one  can  recognize  when  a  large 
jump    (to   a  larger      |E| |)   may  be  advisable   is   a  problem  needing  further 
study. 

For  the  three-term  best   approximation  Braes s   obtains 

f(t)   =  O.Ol+59exp(-U.507t)   +  0.  31+9exp(-1.601t )   +  0  .  560exp(-0  .287t ) , 

-6 
with      |e||    =  1.83  x  10      .      Notice  that  the   exponential   factors    change 

drastically   from  step  to   step.      This   not   only  makes  the   choice  of  initial 
values    difficult,    but   also  makes   one  wonder  about   the  validity  of  applying 
such   curve   fitting  to   experimental   data  unless   n   is   known  a  priori . 

We   first  tried  the   3-term  approximation  using  initial   data  analogous 
to   a  set  that  worked  well   in  the  2-term  case;    namely  a..    =  ap  =   a,.   =  0.25, 
b     =  b      =  b     =  0.      Convergence  was   not   obtained  in  this   case,    however; 
nor  was   it   obtained  when  we  tried  to  bootstrap  up  from  all   zero  parameters. 
It   seems   clear  that   as   the  number  of  terms    increases    it  becomes    increas- 
ingly  important  to  begin  with   reasonable  values   for  the   exponential   fac- 
tors b..      Braess'    procedure   for   choosing  initial  values   leads  to  the 
following  three  sets. 


Run 

al 

a2 

a3 

bl 

b2 

b3 

I 

0.71** 

0.286 

0 

-0.1+07 

-2.1+1+3 

-1.1+25 

II 

0.71^ 

0.286 

0 

-0.1+07 

-2.1+1+3 

-3.1+07 

III 

O.'Jlk 

0.286 

0 

-0.1+07 

-2.1+1+3 

+0.557 

11+ 


Run  I   converged  in   6   iterations   to      |e| |    =  1.775   x   10      .      This    is   smaller 
than  Braess'    value  of  1.83  *  10      .      Our  final  parameter  values   also   dif- 
fered from  his,    as   shown  in  Tahle   II. 


Table   II 

Final  parameter  values.      Fit   of  l/(l+t)  by  a   3-term  exponen- 
tial. 

al  a2  a3  \  b2  b3 


Braess  [h] 
Present  work 


0.01*59  0.3^9  0.560  -U. 507  -1.601  -0.287 
O.OI160  0.39^  O.560  -U.50U  -I.60U  -0.287 


The  only  substantial  difference  is  in  a  ,  and  it  looks  suspiciously  like 
Braess'  value  contains  a  typographical  error.   The  small  differences  in 

|e|   and  in  the  other  parameters  are  probably  due  to  differences  in  the 
discrete  grids  used. 

Rather  surprisingly,  Run  II  failed  to  converge.   The  successive 
iterates  seemed  to  be  tending  towards  the  two-term  approximation,  with  a 
spurious  third  term  having  small  negative  coefficients  and  large  positive 
exponential  factors  —  ultimately  causing  an  exponent  overflow  error. 
Run  III,  which  contained  a  positive  exponential  factor  among  the 
initial  values,  behaved  very  similarly  to  the  analogous  run  (il)  in  the 
2-term  case.   Changes  in  the  parameters  seemed  in  the  correct  direction, 
but  were  constrained  to  be  too  small  (l   =  6  or  7)  for  convergence  in  any 
reasonable  length  of  time. 

It  should  be  noted  that,  with  the  minimum  | |e|   —  hence  also  the 

-6 
minimum  found  by  the  LP  routine  —  so  small  (-10   ),  obtaining  conver- 
gence for  the  three-term  fit  required  some  careful  adjustment  in  the 
tolerances  contained  within  the  LP  routine.   If  too  large  a  tolerance  is 
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used  in  choosing  pivots,  the  LP  routine  does  not  run  to  the  minimum.   On 
the  other  hand,  if  too  small  a  tolerance  is  used  in  the  feasibility  test, 
the  program  decides  that  the  problem  is  infeasible.   A  scheme  for  auto- 
matic adjustment  of  tolerances  to  avoid  these  problems  would  be  a  highly- 
desirable  feature  for  the  LP  routine. 

A  limited  amount  of  experimentation  has  been  done  on  cases  for  which 
best  approximations  do  not  exist.   (The  set  of  multi-term  exponentials  is 
not  closed  but  has  as  limit  points  polynomials  and  polynomials  multiplied 
by  exponentials.)   For  example,  the  function  1-t  is  the  limit  of  a  se- 
quence of  two-term  exponentials 

(-  j)exp(6t)  +  (l+|-) 
as  S^K) .   One  would  therefore  expect  the  program  to  generate  large,  almost 

cancelling,  coefficients,  and  exponential  factors  approaching  zero.   Large 

-I 
changes  in  the  parameters,  however,  are  avoided  because  of  the  2  "  factor 

in  Step  3.   Thus  what  tends  to  happen  is  that  I   becomes  large  and  the 
actual  parameter  changes  become  small  and  ineffectual.   Details  of  two 
runs,  as  described  below,  show,  however,  that  the  behavior  is  in  some  re- 
spects difficult  to  predict. 

Since  the  best  one-term  approximation  to  1-t  is  1 .128exp(-2.177t ) , 
following  Braess'  scheme  we  made  two  runs  from  the  following  initial 
values. 

Run  a  a  b  b 


I 
II 


1.128         0.0  -2. ITT  -5. ITT 

1.128  0.0  -2. ITT  0.823 


In  Run   I,    on  the   first   iteration  only   small    changes  were  produced  by  the 
linear  programming  step.      But  on  the   next    iteration  the  LP  output  was 
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Aa     =  156,    Aa     =  -156,    Ab     =  -27,    Ab     =  65.      Note  that  the   coefficient 
changes   are  as  predicted  —  large  and  of  opposite   signs.      The   changes   in 
the  exponential   factors   seem  a   spurious   result  of  the  linearization.      Also 
predictably,    the   £  became  large  —  so  large  that  the      |E|      changed  so 
little  that  the  program  decided  that  the  convergence  test  was   satisfied! 
In  Run   II,    I  never  became   greater  than   five,    and  the  program  seemed  to 
be  making  slow  but   steady  progress   in  the   right   direction.      After  63   iter- 
ations  the  approximation  was 

39.91exp(-0.0125t)   -   38.91exp(0.012Ut ) . 
A  program  set   up  to   recognize  when  the  exponential   factors  were  approaching 
each  other  and  switch  to  the  proper  limiting  form  would  clearly  be   use- 
ful  for  handling  such  cases. 


17 


5.   CONCLUSIONS 

We  feel  that  the  experiments  described  here  are  very  encouraging. 
At  the  same  time  we  have  noticed  several  problems  requiring  further  study- 
before  the  program  will  be  attractive  for  widespread  use.   These  problems 
include  the  following. 

i.   As  noted  in  the  previous  section,  some  adjustment  of  tolerances 
had  to  be  made  before  a  3-term  exponential  fit  was  successfully  accomplished, 
For  different  data  or  more  terms,  further  adjustments  might  be  needed. 
It  is  essential  that  the  tolerances  within  the  linear  programming  routine 
should  automatically  adjust  to  the  requirements  of  the  particular  problem 
being  run. 

ii.   On  successive  iterations  the  linear  programming  routine  seems 
to  spend  a  considerable  amount  of  time  repeating  the  same  basis  changes. 
Shortcutting  this  process  by  carrying  over  some  information  from  one  iter- 
ation to  the  next  should  lead  to  a  large  increase  in  efficiency. 

iii.   Further  study  needs  to  be  made  of  the  role  of  the  parameter  I. 
A  "large"  I   may  be  a  useful  indicator  of  nonconvergence  —  but  the  defi- 
nition of  "large"  is  difficult  when  convergent  runs  reached  I   values  as 
large  as  9  and  in  some  "nonconvergent"  runs  £  never  was  larger  than  7^ 
Also,  as  previously  discussed,  a  large  jump  to  a  different  region  of  the 
parameter  space  is  occasionally  helpful  even  if  the  test  in  Step  3  is 
violated. 

Finally,  it  should  be  emphasized  that  the  method  of  construction  of 
best  approximations  discussed  here  whould  be  applicable  to  curve  fitting 
by  a  wide  variety  of  functional  forms.   In  fact,  essentially  all  that 
need  be  done  in  the  present  program  to  alter  the  functional  form  is  to 
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write  a  subroutine  to  compute  that  function  and  its  gradient  components, 
But  the  need  for  studying  the  three  problems  listed  above  becomes  even 
more  acute  if  we  hope  to  produce  a  reliable,  efficient,  general-purpose 
curve- fit ting  program. 
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